Abstract This paper examines the reliability of the methods used to capture homicide 
events committed by far-right extremists in a number of open source terrorism data 
sources. Although the number of research studies that use open source data to examine 
terrorism has grown dramatically in the last 10 years, there has yet to be a study that 
examines issues related to selectivity bias. After reviewing limitations of existing terrorism 
studies and the major sources of data on terrorism and violent extremist criminal activity, 
we compare the estimates of these homicide events from 10 sources used to create the 
United States Extremist Crime Database (ECDB). We document incidents that sources 
either incorrectly exclude or include based upon their inclusion criteria. We use a 
“catchment-re-catchment” analysis and find that the inclusion of additional sources results 
in decreasing numbers of target events not identified in previous sources and a steadily 
increasing number of events that were identified in any of the previous data sources. This 
finding indicates that collectively the sources are approaching capturing the universe of 
eligible events. Next, we assess the effects of procedural differences on these estimates. 
We find considerable variation in the number of events captured by sources. Sources 
include some events that are contrary to their inclusion criteria and exclude others that 
meet their criteria. Importantly, though, the attributes of victim, suspect, and incident 
characteristics are generally similar across data sources. This finding supports the notion 
that scholars using open-source data are using data that is representative of the larger 
universe they are interested in. The implications for terrorism and open source research are 
discussed. 


Keywords Terrorism - Far-right violence - Selectivity bias 


Introduction 


Since the terrorist attacks of September 11, 2001, policy and scholarly interest in terrorism 
has grown and the number of studies published on this topic has increased. Silke (2008: 28) 
estimates that terrorism research published post 9/11 represents close to 90% of published 
studies and that a new terrorism book is published every 6 hours. There has been an increase in 
the application of sophisticated statistical methods to study terrorism. Examples include 
Dugan et al.’s (2005) study that combines data from the Federal Aviation Administration, 
RAND, and the Global Terrorism Database (GTD) to examine the impact of counterter- 
rorism interventions on hijacking using continuous-time survival analysis and logistic 
regression, and Smith and Damphousse’s (1998) study that uses structural equation 
modeling to understand the sentencing of terrorists (see also Johnson and Braithwaite 
2009; Townsley et al. 2008). The other articles in this volume use other cutting-edge 
statistical techniques, such as latent class growth analysis and cross-classified multilevel 
models, to study terrorism issues. Although these contributions illustrate the importance of 
using rigorous designs and advanced statistical techniques, it is equally important to 
investigate the nature and quality of data used to produce these sophisticated models. The 
application of any statistical method is only meaningful when researchers understand the 
strengths and weaknesses of their data source so that caveats can be provided and errors 
corrected. 

This paper looks at what occurs early in the research process by examining the reli- 
ability of the methods used to create a database of terrorism incidents from open source 
data. Even though studies have investigated such issues in related disciplines (e.g., comparisons 
of data sources used to study “street” crime, hate crimes, Super-max correctional 
institutions and the number of extremist groups in a state; Biderman and Lynch 1991; 
Freilich and Pridemore 2006; Green et al. 2001; Lynch and Addington 2007; Naday et al. 
2008), no study has compared terrorism databases to explore their error structure. We 
begin to fill this gap by focusing on one particular crime type—homicides committed by at 
least one far-right extremist—and comparing how well 10 terrorism data sets and/or sources 
do in capturing these events. As discussed more fully below, we limit our comparison to 
homicide events because they are more likely (compared to non-fatal attacks) to be cap- 
tured by open sources. In other words, our comparison concentrates on a crime type that is 
likely to be picked up by open source data sets and chronologies. 

We examine the procedures used in collecting information on fatal ideologically 
motivated attacks by far-right extremists in the United States for ten data sources including 
the definitions of terrorism and/or extremist violent criminal activity, the procedures used 
to identify events, and the application of each source’s inclusion criteria to document 
incidents that were either incorrectly included or excluded based upon the source’s own 
inclusion criteria. We assess the effects of these definitions and procedures on the number 
and types of ideologically motivated far-right homicide events captured by these sources as 
well as characteristics of these events. 

We first outline the limitations of the existing terrorism knowledge base and discuss 
why researchers have turned to open sources to create event-level databases. Second, we 
describe the recently created United States Extremist Crime Database (ECDB), including 
why and how it was built. We review ten specific data sources used by the ECDB to 
identify homicides committed by far-right extremists in the United States between 1990 
and 2008. Third, we explain why our comparison examines homicide incidents committed 
by far-right extremists. Fourth, after taking account of differences in definitions and each 
source’s inclusion criteria, we compare the estimates of events from each source and 
document incidents that sources either incorrectly excluded or included based upon their 
criteria. We use a “catchment-re-catchment” analysis to investigate if the inclusion of 
additional sources results in decreasing numbers of target events not identified by previous 
sources and a steadily increasing number of events that were identified in any of the 
previous data sources. This would indicate that we have come closer to capturing the 
universe of eligible events. 

Next, we assess the effects of procedural differences on these estimates. Using differences 
in the estimates, we discuss potential biases that could result if only a single source 
were used to identify incidents. We conclude with a discussion of our findings that sets forth 
an error profile that identifies each step of the database construction process and possible 
errors that could result as well as strategies to limit these potential errors. 


Weaknesses of Terrorism Research 


The study of terrorism did not begin post 9/11. Scholars from different disciplines have 
examined the etiology of terrorism, the effectiveness of countermeasures, and the ideol- 
ogies and structures of different groups for decades. Some studies have raised concerns 
about the quality of terrorism and extremist crime data (Chermak 2002; LaFree and Dugan 
2004; Freilich 2003; Freilich and Pridemore 2006; Merari 1991; Ross 1993; Silke 2001). 
Most theoretical work and hypothesis testing occurs with questionable or insufficient data 
(Hamm 2005; Merari 1991; Ross 1993) and statistical analysis is rare (Silke 2008). Lum 
et al.’s (2006) systematic review of over 14,000 terrorism articles published between 1971 
and 2003 found that only 3% were empirical (p. 8; see also Silke 2001). Victoroff’s (2005: 
34) review of psychological theories of terrorism similarly concluded that there were more 
theories than empirical studies, and “even the small amount of psychological research is 
largely flawed, rarely having been based on scientific methods using normal and validated 
measures of psychological states, comparing direct examination of individuals with 
appropriate controls, and testing hypotheses with accepted statistical methods.” Finally, 
Silke’s (2008) examination of the impact of 9/11 on terrorism scholarship found that nearly 
65% of published articles were literature reviews. The use of inferential statistics 
increased only slightly, from 3.3% of articles prior to 9/11 to 10% post 9/11. Silke concludes (p. 38): 
“Despite the improvements since 9/11, terrorism articles still lag behind other applied 
areas, and concerns must remain over the validity and reliability of many of the conclu- 
sions being made in the field.” 

There is also the perennial difficulty of establishing a terrorism definition that is uni- 
versally accepted by governments, law enforcement and scholars. A review of terrorism 
research indicates that scholars use over 100 different definitions of terrorism (Schmid 
2004). Maxwell and Chermak (2007) conclude that “defining terrorism is the longest and 
most highly contentious debate among terrorism researchers and governments.” Moreover, 
concerns have been raised about the difficulty of comparing terrorism definitions across 
place and defining agency. In the United States, for example, the FBI, State Department 
and the Department of Defense have different definitions of terrorism (Freilich et al. 2009a, b; 
Schmid 2004). These differences are not unreasonable because agencies may choose to 
define terrorism in a manner most appropriate to each agency’s (unique) mission. We 
discuss this issue in more depth when we outline the inclusion criteria employed by the 
sources we used to identify incidents. 

It has been difficult for scholars to obtain data for quantitative analysis of terrorism. It is 
not possible to identify a sufficient number of terrorism events in a sample of the resi- 
dential population. In the United States, the National Crime Victimization Survey (NCVS), 
for example, focuses on household residents and whether they were victims of serious 
“street crime,” and was not designed to identify terrorism victims. Further, because terrorism 
events are rare, even if some terrorism victims were identified, the base rates would 
be low. Some studies have employed self-report data from known terrorists, but this 
research suffers from three weaknesses: (1) the samples are small, (2) the interviews are 
conducted long after the individual participated in terrorism and thus suffer from retro- 
spective construction, and (3) importantly, the projects lack comparison groups, limiting 
their ability to test causal relationships. 

The usual alternative to offender (self-report) and victimization surveys is to use 
archival data such as the Uniform Crime Reports (UCR) that accumulates the universe of 
crime events reported to authorities (LaFree and Dugan 2009). Obviously some terrorism 
events may not be reported to the police or recorded in these data systems because they fail 
to meet the inclusion criteria and are subsumed under another crime type. Moreover, the 
UCR is not collected on an incident basis and lacks detailed event information that would 
be needed for an incident-level analysis (LaFree and Dugan 2009). Importantly, the UCR 
was never designed to capture terrorism events (indeed, the 9/11 attacks were reported in 
a separate section of the UCR report for that year). In addition, because the UCR Program 
does not collect information on crimes in federal jurisdictions, it might exclude attacks like 
the Fort Hood shooting that occur on military bases or other government properties. 

Scholars have therefore turned to other sources of terrorism data such as open source 
documents due to their increased availability. Open source data refer to information that is 
open to the public. Much of this information is in electronic form and is searchable via the 
Internet. But these are not necessarily the defining attributes of open source data. Official 
records from courts and other entities, for example, may not be in electronic form but can be 
made searchable. The data covered by this term have some similarities with secondary 
analysis since much of this information was compiled for purposes other than research. 
Books, newspaper articles, official records and magazine articles are some common types of 
open source data. Since secondary data are not collected for research purposes, they often lack 
systematic and uniform procedures to ensure the reliability of the data as well as documentation 
describing the decisions made in collecting and preparing the data. Newspapers, 
for example, do not act to ensure that their reporters use similar definitions when writing stories on 
the same topic and that they report the same characteristics of people and events. 

Because open source data lack the traditional procedures used in science to ensure 
reliability and representativeness, social scientists suspect that they are susceptible to many 
forms of error, and specifically to selection bias. We suspect that open source data collection 
efforts that rely on newspaper stories may overrepresent spectacular cases of terrorism or 
those perpetrated by certain groups. The result is potentially biased coefficients and other 
misleading results. Some open sources of data may include a greater range of events and be 
systematic in the application of criteria of inclusion, but there has not been much research 
exploring the strengths and weaknesses of using open source data to conduct terrorism 
research. This study begins to address this gap. 


Open Source Terrorism Databases 


Scholars have used open sources to identify terrorism incidents and attributes of these 
events to systematically create event-level databases of terrorism. Technological advances, 
and especially the development of the Internet, provide the opportunity to access large 
numbers of source documents efficiently. Silke (2001) found that 80% of the published 
work in this area is “based either solely or primarily on data gathered from books, journals, 
the media or other published [open access] documents” (see also Merari 1991; Horgan 
2008; Silke 2001). LaFree and Dugan (2004: 63), referring to open source data, conclude: 
“A research source that is rarely used in criminology research is a mainstay of research on 
terrorism.” 

Researchers have begun to combine these open sources into terrorist event databases 
that include increasingly more precise definitions, coding rules and documentation. LaFree 
and Dugan (2007) created the Global Terrorism Database (GTD) from the Pinkerton 
Global Intelligence Services (PGIS) data that identified and coded all terrorism incidents 
from wire services, U.S. State Department reports, other U.S. and foreign government 
reporting, U.S. and foreign newspapers, information from PGIS offices, and data furnished 
by PGIS clients. Similarly, many scholars have used the International Terrorism: Attributes 
of Terrorist Events (ITERATE) data set that uses information from media accounts to 
record attributes of transnational terrorist incidents (Enders and Sandler 2006). Other 
examples include Hewitt’s (2003, 2005) analysis of domestic terrorism that combined 
information from multiple sources, such as the Trick chronology (1976) for events from 
1965 to 1976, the Annual of Power and Conflict (1976-81), the FBI’s annual reports, as 
well as information from watch groups and journalists. Ross’s (1992) study of right-wing 
violence in Canada assembled a detailed chronology of events based upon material from 
the Toronto Reference Library, archival newspaper clippings from the intelligence branch 
of a police agency, files of three private organizations, published chronologies of violent 
political behavior in Canada, and newspaper clippings from major magazines. 

Other researchers have used open sources to create terrorist suspect-level databases. 
Handler (1990; see also Smith 1994) collected data from newspaper clippings of nearly 
400 known terrorists from the 1960s and 1970s and concluded that the demographic 
profiles of Right and Left terrorists are significantly different. Leiken and Brooke (2006) 
collected media reports, court and government documents and reports from non-govern- 
ment sources to develop a database that included biographical data on 373 jihadi terrorists. 
Sageman (2004) used publicly available documents for a network analysis to study 172 
persons who joined the global Salafi jihad, and Bakker (2006) replicated this study and 
approach with 250 jihadists in Europe. 

Open source data have also become an important source of information for the 
law enforcement community (see, e.g., Silber and Bhatt 2006). Freilich et al.’s (2009a, b) 
study surveyed representatives responsible for homeland security investigations within 
state police agencies and asked respondents to rank the importance of various sources of 
terrorism information, such as open (e.g., books and journals, the Internet, media) and non-open 
sources (e.g., informants, Joint Terrorism Task Forces). They found the most frequently 
used source of terrorism information for state police agencies was the Internet (respondents 
used electronic search engines). Other open sources that were identified as being partic- 
ularly helpful were the media and radical publications. 

In sum, scholars have responded innovatively to overcome some of the weaknesses of 
traditional sources of data on crime for the study of terrorism by constructing databases of 
materials collected from open sources. Many published studies use these sources and thus 
any discovered limitations might undermine the robustness of their findings. Although this 
growth has helped advance the field, there is little research that assesses the error structure 
of such databases. This is an oversight that we attempt to remedy in this article by 
examining how well a number of terrorism sources do in capturing homicides committed 
by far-right extremists in the United States. 


Setting Forth a Framework for Describing Databases’ Sources of Error 


Before we describe the error structure for terrorism databases we must develop a 
framework for describing sources of error. In sample surveys, for example, we have come 
to distinguish sampling error from measurement or non-sampling error. The former cat- 
egory has been divided further into error stemming from coverage as opposed to sample 
size. Measurement errors are distinguished by stages in the data collection process (e.g., 
response errors, coding errors), and the same must be done for collecting data from 
open sources. 

There are several steps involved in putting together an open source database that can 
serve as a rudimentary framework for distinguishing sources of error. First, open sources 
must be identified and mined to uncover potential incidents, or groups, or terrorists, or 
countermeasures (or whatever the focus of the research initiative). These efforts attempt to 
identify cases that are consistent with a specific set of inclusion criteria. The identification 
of events will be affected by the sources searched, the search engines used, the key words 
used to search and the manner in which the search terms are entered. Once a listing of 
potentially appropriate incidents has been identified, the second step is to correctly apply 
the source’s inclusion criteria to ensure that events that meet the criteria are included, while 
events that fail to meet the criteria are excluded. The third step is to gather materials that 
provide additional information on the event that can be used to confirm that this is a 
terrorist event and to describe the event more fully. This search will also be affected by the 
amount of information one has on the event from the initial identification step, as well as 
many of the same factors that influence the initial search. These materials can be examined 
electronically or with human coders to collect variable-related information. The final step 
(prior to analysis) is to arrange these materials so that they are coded consistently and are useful 
for analysis. 
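
The second step above can be illustrated with a short sketch that tallies how one source's listing compares with that source's own published criteria. This is only a minimal sketch under stated assumptions: the `Event` fields, the `classify` and `error_profile` names, and the sample records are hypothetical and are not drawn from the ECDB or from any source reviewed here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical record for one candidate event in one source."""
    event_id: str
    meets_criteria: bool  # satisfies the source's own published inclusion criteria
    listed: bool          # actually appears in the source's listing


def classify(event: Event) -> str:
    """Label the outcome of applying a source's criteria to one event."""
    if event.meets_criteria:
        return "correct inclusion" if event.listed else "incorrect exclusion"
    return "incorrect inclusion" if event.listed else "correct exclusion"


def error_profile(events) -> dict:
    """Count each outcome type for a single source."""
    return dict(Counter(classify(e) for e in events))


# Invented sample data, for illustration only.
sample = [
    Event("e1", meets_criteria=True, listed=True),
    Event("e2", meets_criteria=True, listed=False),
    Event("e3", meets_criteria=False, listed=True),
    Event("e4", meets_criteria=False, listed=False),
    Event("e5", meets_criteria=True, listed=True),
]
profile = error_profile(sample)
```

In such a tally, the "incorrect exclusion" and "incorrect inclusion" counts correspond to the two kinds of misapplied criteria that the second step is meant to prevent.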

We examine the bias that can result from the reliance on specific sources when con- 
structing an event-level database using open source material. We do this by examining the 
process by which the Extremist Crime Database (ECDB) was constructed. In constructing 
this database, we used a number of sources of open source data to identify the universe of 
right-wing homicide events. 

Again, this study only includes homicide attacks committed by far-right extremists in 
the United States and ignores non-fatal events and acts that occur outside the United States. 
We limit the analysis in this way for two reasons. First, and importantly, homicide attacks 
in the United States are more likely to be covered by media outlets and open sources than 
other types of attacks (Chermak and Gruenewald 2006). We seldom see a headline, for 
example, that trumpets a “Small Earthquake in Chile, Not Many Dead.” Further, aca- 
demics and reporters in the U.S. are greatly interested in far-right extremists, like anti- 
government Patriots or racist white supremacists, and homicides committed by them are 
likely to be picked up by the media (see e.g., Aho 1990; Chermak 2002; Dobratz and 
Shanks-Meile 1997; Dyer 1997; Coates 1995; Freilich 2003; Freilich et al. 1999; Hamm 
1993, 1997, 2002). Similarly, fatal events in the United States are more likely to be picked 
up by open sources than incidents occurring in more remote locations (e.g., Mongolia) or 
dictatorial countries (e.g., North Korea) that lack a free press. 

This investigation thus focuses on events that are likely to be included in the listings we 
examine. In other words, open source terrorism sources may miss more of the larger 
universe of terrorism events eligible for inclusion than what we examine here. Thus, any 
biases we uncover, arguably, could be more pronounced in the larger universe (that 
includes non-American and non-fatal attacks) of terrorism events in these sources. Second, 
as discussed below, the far-right homicide data in the ECDB have been systematically 
reviewed twice and are the most reliable data in the ECDB. 

We assess each source’s contribution to identifying the ECDB universe of right-wing 
homicide incidents. Following the logic of “catchment-re-catchment” sampling we assume 
that once we take account of definitional and scope differences, all the sources we review 
should tap the same universe of events. We also compare several attributes of victims, 
suspects, and incidents found in each individual data source to the pooled data. Based upon 
these results we make specific recommendations about strategies that could be used to 
mitigate the impact of any bias uncovered. 
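
The catchment-re-catchment logic can be sketched as a running comparison of each source against everything identified before it. The sketch below is an illustration under stated assumptions, not the procedure used to build the ECDB: it assumes events have already been matched to shared identifiers across sources, and the function name, source labels, and event identifiers are all invented.

```python
def recapture_profile(sources):
    """For each source, in the order given, count events not seen in any
    earlier source ("new") and events already identified ("recaptured").

    `sources` is an ordered list of (name, set_of_event_ids) pairs.
    """
    seen = set()
    rows = []
    for name, events in sources:
        new = events - seen
        recaptured = events & seen
        share = len(recaptured) / len(events) if events else 0.0
        rows.append({
            "source": name,
            "new": len(new),
            "recaptured": len(recaptured),
            "recaptured_share": share,
        })
        seen |= events  # everything in this source is now "previously identified"
    return rows


# Invented event identifiers and source labels, for illustration only.
example_sources = [
    ("source_a", {1, 2, 3, 4}),
    ("source_b", {3, 4, 5}),
    ("source_c", {2, 3, 4, 5}),
]
rows = recapture_profile(example_sources)
```

A falling `new` count together with a rising `recaptured_share` as sources are added is the pattern that would suggest the pooled sources are approaching the universe of eligible events. Note that the per-source figures depend on the order in which sources are entered.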

The next section provides a description of the ECDB project. 


The Extremist Crime Database (ECDB) 


The United States Extremist Crime Database (ECDB) was created in 2006 and it has been 
supported by the Department of Homeland Security (DHS) both directly and through the 
National Consortium for the Study of Terrorism and Responses to Terrorism (START) 
and from other sources (Freilich and Chermak 2009a, b, 2011). There were several 
justifications for building the database. First, the domestic far-right poses a significant 
threat to public safety (Freilich et al. 2009a, b). Domestic terrorism attacks generally 
outnumber international ones 7 to 1 in the United States and the far-right is especially 
dangerous (LaFree et al. 2006; see also Hewitt 2003, 2005). Second, the criminal 
activities of the far-right are a neglected research topic. Most terrorism research focuses 
on international terrorism. Third, studies on the domestic far-right’s crimes usually rely 
on anecdotal or case study data. The few empirical works restrict their examinations to 
“terrorist” acts prosecuted on the federal level. For example, a literature review of over 
300 studies examining far-right extremism concludes that fewer than a third used empirical 
data to produce findings (Gruenewald et al. 2009; see also Coates 1995; Langer 2003; 
Neiwert 1999). 

The ECDB includes data on the suspects, victims and targets, event, and group char- 
acteristics of violent crimes committed by supporters of the domestic far-right from 
1990-2008.¹ The database includes both ideological crimes (terrorist and non-terrorist 
acts) and routine/non-ideological violent crimes. Including non-ideological violent acts 
regardless of jurisdiction (federal, state and non-tried cases are all included) is significant 
because prior research has only focused on far-right criminal activity on the federal level, 
and/or limited themselves to crimes that fall under a particular definition of terrorism 
(Hewitt 2003, 2005; Smith 1994), or only analyzed the far-right’s involvement in specific 
crimes. 


1 The ECDB has received funding to expand its focus to include far-left and Al Qaeda-inspired violent 
criminal activities and financial crimes by far-rightists and Al Qaeda-inspired offenders. Since the focus of 
the analysis is on far-right homicides, these data will not be discussed. 


Identifying Far-Right Homicides for Inclusion in the ECDB 


For an incident to be included in the ECDB, two criteria must be satisfied. First, behaviorally, 
a violent act (for this study, a homicide) must have been committed inside the 
United States between 1990 and 2008. Second, attitudinally, at the time of the incident at 
least one of the suspects who committed this act must have subscribed to a far-right belief 
system.² The incident is included in the database only when both criteria are satisfied. The 
development of the ECDB occurred in multiple stages but our concern here is the iden- 
tification of far-right homicides through various sources. The goal is to understand what 
each individual source added to the database and whether each source had a consistently 
increasing number of events that were identified in any of the previous data sources. This 
would indicate that we have come closer to capturing the universe of eligible events. 
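
The two-part inclusion rule can be written as a simple predicate. This is a minimal sketch that assumes each incident has already been coded by hand; the `Incident` fields and the `meets_ecdb_criteria` name are hypothetical and do not reflect the ECDB's actual codebook or software.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Incident:
    """Hypothetical coded incident; field names are illustrative."""
    is_homicide: bool
    in_united_states: bool
    year: int
    # One flag per suspect: held far-right beliefs at the time of the incident.
    suspects_far_right: Sequence[bool]


def meets_ecdb_criteria(incident: Incident) -> bool:
    """Return True only when both the behavioral and attitudinal criteria hold."""
    behavioral = (
        incident.is_homicide
        and incident.in_united_states
        and 1990 <= incident.year <= 2008
    )
    # At least one suspect must have subscribed to a far-right belief system.
    attitudinal = any(incident.suspects_far_right)
    return behavioral and attitudinal


# Invented examples, for illustration only.
included = Incident(True, True, 1995, (False, True))
no_far_right_suspect = Incident(True, True, 1995, (False, False))
outside_period = Incident(True, True, 2010, (True,))
```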

Our attempt to systematically identify every far-right homicide that was listed in an 
open source involved reviewing each of our sources twice. First, as we began building the 
ECDB (2006-2008) we reviewed (1) existing terrorism databases such as the American 
Terrorism Study (ATS) and the GTD; (2) official sources such as the FBI’s Terrorism in the 
United States annual reports; (3) scholarly and journalistic accounts; (4) materials produced 
by watch-groups such as the Anti-Defamation League (ADL) and the Southern 
Poverty Law Center (SPLC); and (5) media searches that we conducted to uncover cases 
(see below). Second, in the summer of 2010 we reviewed each source a second time to 
validate each included incident and to double-check that we did not miss any incident that 
should have been included. 


Comparing Data Sources of Far-Right Homicides 


The focus of our analysis is comparing the homicides captured from 10 of the sources³ 
used by the ECDB to identify homicides committed by far-rightists in the United States 
between 1990 and 2008.⁴ These 10 sources include three noted scholarly databases or 
academic listings (Global Terrorism Database; American Terrorism Study; Hewitt’s 
chronology), two law enforcement and official sources (FBI; the State and Local Anti-Terrorism 
Training listings), two major watch-group organizations (Anti-Defamation 
League; Southern Poverty Law Center), and systematic media searches through the Lexis-Nexis 
web engine; the Ross Institute Internet Archives for the Study of Destructive Cults, 
Controversial Groups and Movements (identified as Rick Ross’s Internet archive 
below), as well as incidents uncovered by ECDB coders as they reviewed information 
generated from the previous 9 sources. 


2 This study operationalizes the far-right as individuals or groups that subscribe to aspects of the following 
ideals: They are fiercely nationalistic (as opposed to universal and international in orientation), anti-global, 
suspicious of centralized federal authority, reverent of individual liberty (especially their right to own guns, 
be free of taxes), believe in conspiracy theories that involve a grave threat to national sovereignty and/or 
personal liberty and a belief that one’s personal and/or national “way of life” is under attack and is either 
already lost or that the threat is imminent (sometimes such beliefs are amorphous and vague, but for some 
the threat is from a specific ethnic, racial, or religious group), and a belief in the need to be prepared for an 
attack either by participating in or supporting the need for paramilitary preparations and training or 
survivalism. Importantly, the mainstream conservative movement and the mainstream Christian right are not 
included. 

3 We attempted to review an 11th source, the Bureau of Justice Assistance (BJA) and Department of 
Defense (DOD) funded Institute for the Study of Violent Groups (ISVG) that tracks crimes committed by 
political extremists since 2002. Our requests for a listing of far-right homicides committed in the U.S. went 
unanswered. 

4 We excluded the National Counterterrorism Center’s Worldwide Incidents Tracking System (WITS) 
database, which includes terrorist acts committed since 2005, from our analysis because it only contained one 
far-right related homicide event. This homicide was included in the other sources we examined. 

Table 1 lists each source, its published criteria and describes how we searched it. Our 
review of each source consisted of reading through its listings or narratives of attacks 
committed by political extremists in the U.S. to flag all incidents that satisfied our inclusion 
criteria. 

A review of these sources found that they diverged by design on five inclusion criteria: 
(1) the time frame of included cases; (2) whether acts were restricted to 
ideologically (or politically) motivated homicides, i.e., whether routine/non-ideologically 
motivated acts were also included;⁵ (3) whether acts were restricted to crimes that were not 
“hate-crimes,” i.e., whether bias-motivated crimes were included;⁶ (4) whether the 
homicide must have been committed by a group, i.e., whether homicides committed by 
lone actors were also included;⁷ and (5) whether only incidents investigated federally were 
included, i.e., whether state-level cases were also included.⁸ It is important to review these 
different criteria because they result in different “universes.” Thus, adjustments must be 
made when comparisons are made across these sources. 
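
One way to picture the adjustment is to restrict the pooled events to those a given source could have captured under its own design before comparing counts. The sketch below is illustrative only: the field names, the `in_scope` and `eligible_universe` helpers, and the two example criteria sets are hypothetical and do not reproduce the decisions of any actual source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceCriteria:
    """Hypothetical encoding of the five design decisions for one source."""
    first_year: int
    last_year: int
    ideological_only: bool      # excludes routine/non-ideological homicides
    excludes_hate_crimes: bool  # excludes bias-motivated homicides
    groups_only: bool           # excludes homicides by lone actors
    federal_only: bool          # excludes state-level cases


@dataclass(frozen=True)
class Homicide:
    """Hypothetical coded homicide event from the pooled data."""
    event_id: str
    year: int
    ideological: bool
    hate_crime: bool
    lone_actor: bool
    federal_case: bool


def in_scope(h: Homicide, c: SourceCriteria) -> bool:
    """Would this event be eligible under the source's stated design?"""
    if not (c.first_year <= h.year <= c.last_year):
        return False
    if c.ideological_only and not h.ideological:
        return False
    if c.excludes_hate_crimes and h.hate_crime:
        return False
    if c.groups_only and h.lone_actor:
        return False
    if c.federal_only and not h.federal_case:
        return False
    return True


def eligible_universe(pooled, criteria):
    """Subset of pooled events that the source was designed to capture."""
    return [h for h in pooled if in_scope(h, criteria)]


# Invented criteria and events, for illustration only.
narrow = SourceCriteria(1990, 2008, True, True, True, True)
broad = SourceCriteria(1990, 2008, False, False, False, False)
pooled = [
    Homicide("h1", 1995, True, False, False, True),
    Homicide("h2", 1999, True, True, False, True),
    Homicide("h3", 2003, False, False, True, False),
    Homicide("h4", 1985, True, False, False, True),
]
narrow_ids = [h.event_id for h in eligible_universe(pooled, narrow)]
broad_ids = [h.event_id for h in eligible_universe(pooled, broad)]
```

Comparing each source's captured events against its own eligible subset, rather than against the full pooled list, keeps design differences from being mistaken for capture errors.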

Table 2 presents the 10 sources’ decisions on the criteria just outlined. 


5 Six of the sources we used only included acts committed to further a “political” or “ideological” 
objective. These sources assume that ideologically motivated crimes are committed for a higher purpose and 
are different from routine crimes committed to further personal interests. The ADL, SPLC, the media and our 
coders focused, however, on crimes committed by extremist suspects as opposed to events. Thus, these 
sources are interested in all homicides committed by far-rightists (both ideological and non-ideological). 
Our study assumes that the ECDB categorization of a homicide event as either ideologically motivated or 
non-ideologically motivated is accurate. In this sense, we are privileging the ECDB. For example, if the 
ECDB categorizes homicide X as ideologically motivated, while the FBI or GTD categorize it as non- 
ideological we code that event as ideologically motivated. While we appreciate critiques that question why 
the ECDB is treated as the ground truth, we make this determination for a few reasons. Again, the ECDB’s 
ongoing data collection efforts focus almost exclusively on homicides committed by far-rightists and other 
extremists in the United States. Further, the ECDB systematically searched through a series of open sources 
to identify these events twice. Conversely, most other data collection efforts focus on a much larger 
geographic universe and have a much larger N. 


6 The FBI and databases like the GTD argue that while hate/bias-motivated crimes are related to terrorism, 
they are a separate phenomenon. Hate crimes are counted separately (e.g., the FBI’s UCR hate crimes report 
is distinct from the government’s annual terrorism reports). Non-hate crime ideologically motivated acts are 
thought to implicate broader political objectives that qualify as terrorist, while hate crimes do not qualify. 
Other sources disagree and sometimes conclude that ideologically motivated acts, whether anti-government or 
anti-minority, qualify and should be labeled terrorist. Here too we assume that the ECDB categorization of a 
homicide event as either a bias-motivated crime or not is accurate. 


7 The FBI and sources that rely upon its definition (e.g., the ATS and Hewitt) conclude that acts committed 
by lone wolves usually do not qualify as terrorist. Groups like the IRA or Al Qaeda, 1998-2001, are 
organized entities that engage in ongoing criminal activities designed to harm American interests. Con- 
versely, lone actors usually lack the logistical support and infrastructure to conduct repeated attacks and to 
remain a longstanding threat to government interests. Thus, lone wolves do not implicate the same threat 
level and do not qualify as terrorist. Other sources conclude that ideologically motivated acts, regardless of 
the organizational level of the suspects who commit it, should be labeled terrorist. 


8 The FBI is charged with investigating domestic terrorism incidents and these incidents are subsequently 
prosecuted at the federal level. Some sources again conclude that the jurisdiction of the prosecution is 
irrelevant and that what matters is whether the act is ideologically motivated. 
The FBI, the ATS and Hewitt’s (2005) chronology claimed to rely on either the FBI’s 
terrorism definition⁹ or policies.¹⁰ These three sources should include the fewest number of 
incidents because the FBI’s definition and policies are the most restrictive and only include 
(1) ideologically motivated acts (2) that were not classified as bias-motivated crimes, (3) 
were committed by a group, and (4) were prosecuted on the federal level. Further, the FBI 
and Hewitt did not include cases for the entire period. The FBI included cases from 1990 to 
2005 and Hewitt only went from 1990 to June 2004. 

Three sources, the GTD, the State and Local Anti-Terrorism Training (SLATT) source, 
and Rick Ross’s website, have broader criteria. These sources included lone wolves and 
incidents prosecuted on the state-level. However, the GTD and SLATT should have fewer 
incidents than Rick Ross. The GTD excludes bias-motivated crimes that are spontaneous 
ideologically motivated attacks against racial and religious minorities, homeless and gay 
individuals (Personal Communication with GTD, 10/26/10). SLATT only included cases 
from 1997-2008 which should limit the total number of cases it included. 

Two sources—the ADL and the SPLC—should include the greatest number of cases 
because they included incidents for the entire 1990-2008 time period and their criteria 
were the most expansive. Both watch-groups included ideologically motivated and non- 
ideological homicides, bias-motivated homicides, as well as homicides committed by lone 
actors and/or that were prosecuted on the state-level. We applied these broad criteria to the 
systematic media searches we conducted to identify cases (see Table 1 for the search 
engines we searched and for the key words used). Finally, as our coders conducted open 
source searches to uncover information to fill in ECDB values they came across additional 
far-right homicides that no other source had identified (N = 16). Here too we applied the 
broadest criteria and included bias-motivated homicides and ideological and non-ideo- 
logical homicides committed by far-rightists. We also included far-right homicides com- 
mitted by lone wolves and/or that were prosecuted on the state-level. 
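
The way these criteria carve out different universes can be sketched in code. The following is a minimal illustration and not the ECDB's actual coding scheme: the `Homicide` fields, the two eligibility functions, and the three example events are hypothetical names and values introduced here. The "FBI-style" rule applies the four restrictions and the 1990-2005 time frame described above, and the "watch-group-style" rule accepts any far-right homicide in the 1990-2008 period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Homicide:
    """One far-right homicide event; field names are illustrative assumptions."""
    year: int
    ideological: bool   # ideologically motivated?
    bias_crime: bool    # classified as a bias-motivated (hate) crime?
    group: bool         # committed by a group rather than a lone actor?
    federal: bool       # prosecuted at the federal level?


def fbi_style_eligible(h: Homicide, first_year: int = 1990, last_year: int = 2005) -> bool:
    """Most restrictive universe: all four restrictions plus the time frame."""
    in_period = first_year <= h.year <= last_year
    return in_period and h.ideological and not h.bias_crime and h.group and h.federal


def watch_group_style_eligible(h: Homicide, first_year: int = 1990, last_year: int = 2008) -> bool:
    """Broadest universe: any far-right homicide inside the time frame."""
    return first_year <= h.year <= last_year


# Hypothetical events, for illustration only.
events = [
    Homicide(1995, ideological=True, bias_crime=False, group=True, federal=True),
    Homicide(1999, ideological=True, bias_crime=True, group=False, federal=False),
    Homicide(2007, ideological=False, bias_crime=False, group=False, federal=False),
]
fbi_universe = [h for h in events if fbi_style_eligible(h)]
watch_universe = [h for h in events if watch_group_style_eligible(h)]
print(len(fbi_universe), len(watch_universe))
```

Expressing each source's rule as a separate predicate makes it explicit which events a source could have listed, which is the adjustment needed before coverage is compared across sources.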

We selected these 10 sources from the ECDB to compare because they are widely 
known and/or have had their data used by scholars investigating terrorism and extremist 
activity in the United States. We identified 68 publications that used data from these 
sources (published by June 2010). Unsurprisingly, 25 studies have been published using 
GTD data. There have been at least 13 peer-reviewed articles using SPLC data, and 7 studies 
using ADL data. Finally, there have been at least 13 studies using ATS data, 4 studies using 
Hewitt data, 3 studies using media data, and 3 studies using FBI data. 


Findings 


The ECDB has identified 329 far-right homicides that occurred between 1990 and 2008. 
Again, the ECDB includes both ideological and non-ideological homicides. For example, if 
a skinhead murders an African American as part of an initiation it would be defined as an 
ideological homicide, but if he kills a friend because he dented his truck it would be 
non-ideological. Interestingly, the majority of homicides, nearly 61%, were 
non-ideological (and thus should not be in the ATS, FBI, GTD, Hewitt or SLATT) and 39% 
(N = 128) were ideologically-motivated. Another factor is whether the suspect was 
affiliated with a group or committed the homicide as a lone actor. Approximately 37% of 
the homicides were committed by lone actors (again these homicides should not be in the 
three sources that follow the FBI’s policies). Although many terrorism studies focus on 
offenses charged at the federal level, our results find that only 4% of the homicides were 
charged in federal court, nearly 82% in state court, and over 4% of the homicides were 
charged in both state and federal jurisdiction. Nine percent of the cases did not go to trial 
(the suspect either was killed or committed suicide) and we were unable to determine 
jurisdiction in 1% of the cases. Finally, over 5% of the homicides were committed in 
prison. 


9 The FBI defines terrorism as “the unlawful use of force or violence against persons or property to 
intimidate or coerce a government, the civilian population, or any segment thereof, in furtherance of 
political or social objectives” (FBI, 1997). 


10 While Hewitt claimed to rely on the FBI’s definition, his precise methodology and validation scheme are 
unclear. He devotes only four pages, in an appendix, to his data sources and coding procedures. Further, as 
our analyses demonstrate, Hewitt’s chronology includes lone actor, state-level and other attacks that should 
have been excluded under a strict application of the FBI’s guidelines. 

Table 3 presents the percent of the final set of ECDB far-right homicides found in each 
source. This table demonstrates how different source criteria impact the number and type 
of cases that would be included in a source. In the first data column, we present the results 
for all homicides, controlling for year. For example, if the time frame for a source covered 
all years (1990-2008), then we used 329 homicides as the denominator. However, when a 
source covered a shorter time period, we eliminated those homicides that occurred outside 
its time frame.¹¹ 
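
The denominator adjustment just described can be illustrated with a short sketch. It is a simplified illustration under stated assumptions: the function name, the event identifiers, and the years are hypothetical, and the sketch counts only events that fall inside a source's time frame, so it does not model special cases such as a source reporting on a homicide that occurred before its coverage began.

```python
def coverage_percent(event_years, found_ids, first_year, last_year):
    """Percent of eligible events (by year of occurrence) that a source captured.

    event_years: dict mapping event id -> year of the homicide
    found_ids:   set of event ids that appear in the source
    """
    eligible = {eid for eid, year in event_years.items()
                if first_year <= year <= last_year}
    if not eligible:
        raise ValueError("no events fall inside the source's time frame")
    captured = eligible & set(found_ids)
    return 100.0 * len(captured) / len(eligible)


# Hypothetical mini-universe of six events (ids and years are made up).
event_years = {"a": 1991, "b": 1996, "c": 1998, "d": 2001, "e": 2006, "f": 2008}
found_in_source = {"c", "d"}

# Same source, two candidate denominators: the full period versus 1997-2008.
full_period = coverage_percent(event_years, found_in_source, 1990, 2008)
short_period = coverage_percent(event_years, found_in_source, 1997, 2008)
print(round(full_period, 1), round(short_period, 1))
```

Restricting the denominator to the years a source actually covered keeps a source with a short time frame from looking less complete than it is.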

Over 55% of the homicides were extracted from the SPLC data and over 71% of the 
homicides were in the ADL materials. These two watch-group organizations should 
include the highest percentage of incidents since they have the broadest inclusion criteria. 
Further, the SPLC provides chronologies of incidents in its intelligence reports and pro- 
duces numerous publications documenting incidents. Similarly, the ADL discusses inci- 
dents in its publications, on its website, and through its connection to the militia watchdog 
website which documents incidents in its Calendar of Conspiracy and through forwarded 
accounts on its listserv. SLATT’s website compiles information from various documents 
and media sources that it then presents in summary form. Over 33% of the homicide 
incidents that occurred between 1997 and 2008 were extracted from SLATT. 

Fewer homicides were extracted from the other sources. Approximately 17% of the 
homicides were extracted from the Ross Institute for the Study of Destructive Cults, 
Controversial Groups, and Movements website, 16% were found in the media using 
general search strategies, and 4.6% from the FBI’s Terrorism in the United States reports. 
Over 16% of the homicides were extracted from Hewitt’s (2005) Chronology, a listing of 
incidents compiled from a variety of sources. Five percent of the homicides were extracted 
from the GTD and 2% from the ATS. 

Importantly, we should not expect most of these sources to have all or even most of the 
329 homicides because of their inclusion criteria. For example, the focus of the ATS 
database and the FBI is on terrorism incidents that were prosecuted in federal courts, and 
there are only 26 federal cases in the ECDB database. Unsurprisingly, our results dem- 
onstrate that the sources used to conduct terrorism research will provide a restricted or 
expanded view depending on the criteria used for inclusion. Also significant is that our 
coders added 16 cases to the database by identifying cases from the search file materials. 


11 There was the possibility that a homicide that occurred outside the time frame would be included in a 
source. SLATT reports legal decisions as these homicides progress through the criminal justice system. For 
example, it reported on the numerous appeals and court decisions related to the Oklahoma City bombing. 
We treated these cases as being included in the source even though they fell outside the time frame. In 
addition, Hewitt’s data collection ended 6/30/2004. Thus, we compare this source to only the 1990-2004 
homicides. Finally, the GTD does not include any incidents from 1993; thus, we only compared this source 
for 1990-1992 and 1994-2008. 
These cases were not included in any other source examined here, but were discovered as 
the coder read the materials about an observed case. 

The second and third data columns include findings on how these sources include 
ideological and non-ideological offenses. Over 81% of the ideological homicides were 
extracted from the SPLC, approximately 67% from the ADL, and 63% from SLATT. Most 
of the incidents extracted from Hewitt (2005) were ideologically-motivated homicides; 
nearly 32% of the ideological homicides were extracted from this source. Over 21% of the ideological homicides were extracted from 
general media searches and nearly 23% from the Rick Ross internet archive. The FBI had 
10%, the GTD had about 10%, and the ATS had about 4% of the ideological homicides. 

The criteria of most of the sources, and especially of the law enforcement sources, indicate that 
we should not expect them to include non-ideological homicides. The results support this 
hypothesis. The FBI did not report any non-ideological homicides, and SLATT includes 
19% of all non-ideological homicides. In contrast, SLATT includes over 63% of 
the ideological homicides that occurred between 1997 and 2008. 

Similarly, the inclusion criteria guiding the GTD and ATS should result in most, if not 
all, non-ideological homicides being excluded. Only 2.6% of the non-ideological homi- 
cides were in the GTD and only 1% in the ATS. Interestingly, the percentage of non- 
ideological homicides was lower for every source except the ADL when compared to the 
ideological homicide findings. Only 39% of the non-ideological homicides were extracted 
from SPLC data and reports, approximately 13% from the Rick Ross internet archive or 
media searches, and nearly 2% from Hewitt (2005). The ADL, however, reported on 67% 
of the ideological homicides and over 73% of the non-ideological homicides (such as a 
Skinhead killing his spouse during a non-ideological domestic violence incident). Over 5% 
of the non-ideological homicides were identified by our coders. 

The next columns in Table 3 include coverage of homicide incidents comparing lone 
actor to group results for ideological events. This is important to examine because three of 
our sources—FBI, Hewitt (2003, 2005) and the ATS—follow the FBI’s policy that usually 
excludes incidents where the suspect acted alone. This is a significant oversight because 
37% of incidents were committed by lone actors. Although most of these homicides were 
not motivated by ideology (61%), a large number were. Scholars and law enforcement 


Table 3 Far-right homicide coverage in sources 


Source                                All         Ideological  Non-ideological  Group   Lone-Wolf  Bias   Non-Bias
                                      homicides   homicides    homicides
FBI (1990-2005)                       4.6%        10.3%        0.0%             6.7%    14.6%      6.3%   11.0%
Hewitt Chronology (1990-6/30/2004)    16.3        31.6         1.6              33.3    26.8       43.8   30.0
GTD (1990-2008)                       5.4         10.1         2.6              10.8    6.8        10.5   10.1
SPLC (1990-2008)                      55.6        81.3         39.3             86.3    72.3       68.4   83.3
ADL (1990-2008)                       71.1        67.2         73.6             62.5    74.5       78.9   65.7
ATS (1990-2008)                       2.1         3.9          1.0              3.8     4.3        5.3    3.7
Rick Ross Website (1990-2008)         17.0        22.7         13.4             27.5    12.8       21.1   23.1
Media (1990-2008)                     16.4        21.1         13.4             25.0    14.9       10.5   23.1
SLATT (1997-2008)                     33.3        63.0         19.1             69.2    55.9       27.3   69.4
Coders                                4.9         3.9          5.5              1.3     8.5        0.0    4.6
personnel have stressed the need to better understand the criminal activities of lone wolves 
(Chermak et al. 2010; Turchie and Puckett 2007). It is not surprising that most sources 
included a lower percentage of homicides involving lone wolves compared to those 
homicides where the suspect was connected to a group. For example, 15% of the lone wolf 
homicides were extracted from media searches compared to 25% of the group homicides. 
Similarly, 13% of the lone wolf homicides but 28% of the group homicides were extracted 
from the Rick Ross internet archive. Although the SPLC included information about a 
higher percentage of homicides involving groups (compared to lone wolf homicides), the 
ADL included information about a somewhat lower percentage. Seventy-five percent of the 
lone wolf homicides were extracted from the ADL, but 63% of the group homicides were 
located within their materials. Interestingly, the FBI provided a higher number of lone wolf 
homicides even though such incidents are specifically excluded by their inclusion criteria. 

The final two columns include whether an ideological incident was a bias-motivated 
crime or not. These two columns were included because the criteria of some of the sources 
(FBI, ATS and GTD) exclude (or, in the case of the GTD, generally exclude) bias-motivated crimes, although 
the sources actually included some of these incidents. For example, the FBI included 6.3%, 
the GTD 10.5% and the ATS 5.3% of the bias-motivated crimes. SLATT included 70%, 
Hewitt 30%, the GTD 10.1%, the FBI 11%, Rick Ross 23.1%, media searches 23.1%, the 
ATS 3.7%, and the ADL 65.7% of the non-bias-motivated homicides. Surprisingly, the 
SPLC includes a somewhat higher percentage of non-bias-motivated crimes (83.3%) than of 
bias-motivated crimes (68.4%). 

Table 4 presents the percent distribution of the number of sources per homicide. It also 
examines the number of sources for ideological and non-ideological and lone wolf and 
group homicides. On average, these homicides were mentioned in one or two sources.¹² 
Forty-seven percent of the homicides were mentioned in only one source and 24% were 
mentioned in two sources, highlighting the importance of extracting information from as 
many sources as possible. Few homicides were mentioned in eight or nine of the sources. A 
comparison of ideological to non-ideological results indicates that ideological homicides 
were more likely to be mentioned in multiple sources. Over 31% of the ideological 
homicides, but 54% of the non-ideological homicides were cited in only one source. 
Moreover, over 78% of the non-ideological homicides were in only one or two sources. 
This was expected because four sources specifically exclude non-ideological homicides. In 
contrast, 20% of the ideological homicides were in three sources, 9% were in four sources, 
7% in five sources, and 7% in eight sources. The number of sources that included lone wolf 
and group homicides are similar. Over 38% of the lone wolf homicides were in only one 
source and 19% were in two sources, but several lone wolf incidents were presented in 
eight sources. Almost 70% of the group homicides were in one or two sources. 
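
The tabulation behind Table 4 (how many sources mention each homicide, plus the mean) can be sketched as follows. This is an illustrative sketch only: the function name and the four example events are hypothetical and are not drawn from the ECDB.

```python
from collections import Counter


def source_count_distribution(event_sources):
    """Return (percent distribution of source counts, mean sources per event).

    event_sources: dict mapping event id -> set of source names listing it
    """
    counts = [len(sources) for sources in event_sources.values()]
    total = len(counts)
    tally = Counter(counts)
    distribution = {k: 100.0 * n / total for k, n in sorted(tally.items())}
    mean = sum(counts) / total
    return distribution, mean


# Hypothetical events and source lists, for illustration only.
event_sources = {
    "e1": {"ADL"},
    "e2": {"ADL"},
    "e3": {"ADL", "SPLC"},
    "e4": {"ADL", "SPLC", "SLATT", "Media"},
}
distribution, mean = source_count_distribution(event_sources)
print(distribution, mean)
```

Running the same function on a subset (for example, only ideological events) would yield the corresponding column of the table.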


Catchment and Re-Catchment Analysis¹³ 


Applying the logic of “catchment-re-catchment” sampling, Table 5 takes into account 
definitional and scope differences across our sources. This table, in other words, examines 


12 It is likely that a specific source relied on several of the other sources examined here to identify incidents. 
That is, these sources rely on other open sources, and thus the number of sources might be indicative of a 
single source that captured an incident, that was then used by the other sources. 


13 The catchment/recatchment methodology is similar to the capture and recapture methodology. 
Importantly, though, we use the catchment/recatchment methodology here more as an analogy than a specific 
method, largely because the harvesting of open source data is not sampling in the traditional sense 
and terrorism is not very prevalent. For these reasons it is not necessary to do formal estimation as is done 
in more traditional catchment/recatchment applications. 




J Quant Criminol (2012) 28:191-218 209 


Table 4 Number of sources by type of homicide 


Number of   All homicides   Ideological     Non-ideological   Group           Lone-Wolf
sources     (mean = 2.09)   homicides       homicides         (mean = 2.22)   (mean = 1.98)
                            (mean = 2.83)   (mean = 1.61)
1           45.3%           31.3%           54.2%             27.5%           38.3%
2           23.7            22.7            24.4              25.0            19.1
3           14.9            19.5            11.9              20.0            19.1
4           4.9             9.4             2.0               8.8             10.6
5           3.6             7.0             1.5               8.8             4.3
6           1.2             2.3             0.5               2.5             0.0
7           0.0             0.0             0.0               0.0             0.0
8           3.0             7.0             0.5               6.3             8.5
9           0.3             0.8             0.0               1.3             0.0
10          0.0             0.0             0.0               0.0             0.0


Table 5 How sources capture similar homicides (events not prosecuted as hate crimes; N = 59) 


            GTD       Media      Hewitt     Rick Ross   SLATT      SPLC       ADL
            (N = 5)   (N = 10)   (N = 11)   (N = 17)    (N = 42)   (N = 46)   (N = 50)
GTD         5 new     4/5        4/5        5/5         5/5        5/5        5/5
Media                 6 new      6/10       8/10        10/10      10/10      10/10
Hewitt                           5 new      6/11        11/11      11/11      11/11
Rick Ross                                   7 new       15/17      17/17      15/17
SLATT                                                   21 new     34/42      39/42
SPLC                                                               10 new     40/46
ADL                                                                           3 new

Two incidents were not included in any of the sources, but were identified by our coders during the search 
process 


the overlap among data sources by only focusing on incidents that meet the inclusion 
criteria (are eligible) for all the listed sources.¹⁴ Thus, this analysis only includes the 59 
ideologically-motivated incidents that were not prosecuted as bias-motivated crimes. 
Limiting analysis in this way should result in all these sources encompassing the same 
universe of events. Thus, the inclusion of additional sources should result in a steadily 
decreasing number of events not identified by previous sources and a consistently 
increasing number of events that were identified in previous sources. Such a finding would 
indicate that we are approaching capturing the universe of eligible events. Conversely, if 
these data sources are not tapping the same underlying universe of terrorism incidents, we 
should see little change in the number of new events identified with the addition of new 
sources and a low or stable number of events captured by at least one of the prior data 
sources. 


14 We have excluded the FBI and ATS sources from these analyses because they focus only on federal 
cases. 
We started in column one with the source (i.e., the GTD) that included the fewest 
incidents and in each subsequent column examined how many of these incidents were 
captured by an additional source and how many new incidents this additional source added. 

The media had 4/5 of the events captured from the GTD, but also added 6 new inci- 
dents. Hewitt’s data included 4/5 GTD incidents and 6/10 media incidents, and added 5 
new incidents. Rick Ross included all of the GTD events, 8/10 media incidents, and 6/11 
Hewitt incidents. It also added 7 new incidents. SLATT has significantly more events, 
including all of the GTD, media, and Hewitt incidents and most of the Rick Ross 
incidents. It also added 21 new incidents. The SPLC had all GTD, media, Hewitt, and Rick 
Ross incidents, and 34/42 SLATT incidents. This source added 10 new incidents. Simi- 
larly, the ADL had all of the GTD, media, and Hewitt incidents, and most of the Rick Ross, 
SLATT and SPLC incidents. The ADL also included three incidents that were not in any 
other of the sources. 
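
The sequential tabulation described above can be sketched as follows. This is a minimal sketch, not the procedure actually used to build Table 5: the function name is hypothetical, and the event identifiers are made-up values chosen only so that the counts agree with the first three columns of Table 5 (the incident-level data are not reproduced here).

```python
def catchment_table(ordered_sources):
    """For each source, in order, report overlaps with earlier sources and new events.

    ordered_sources: list of (name, set of event ids), fewest events first
    Returns a list of dicts with keys 'source', 'overlap', and 'new'.
    """
    seen = set()      # events captured by any earlier source
    earlier = []      # (name, events) pairs already processed
    rows = []
    for name, events in ordered_sources:
        # (events shared with the earlier source, size of the earlier source)
        overlap = {prev: (len(events & prev_events), len(prev_events))
                   for prev, prev_events in earlier}
        rows.append({"source": name, "overlap": overlap, "new": len(events - seen)})
        seen |= events
        earlier.append((name, events))
    return rows


# Hypothetical event ids consistent with the GTD, Media, and Hewitt columns.
gtd = {1, 2, 3, 4, 5}
media = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
hewitt = {2, 3, 4, 5, 6, 7, 12, 13, 14, 15, 16}

rows = catchment_table([("GTD", gtd), ("Media", media), ("Hewitt", hewitt)])
for row in rows:
    print(row)
```

If the sources tap the same universe, the `new` counts should eventually shrink while the overlap fractions rise as further sources are appended to the list.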

This analysis demonstrates that consistently adding sources to a data collection effort 
substantially increases the number of events captured. There appears to be a convergence 
as ultimately fewer new events were found as the final sources are added. This is especially 
the case when specialized sources like SPLC and ADL are used. Indeed, our coders 
identified only two incidents (about 3% of the 59 eligible events) that did not appear in any other 
source. This suggests that these data sources are coming closer to capturing the universe of 
events. 


Potential Biases Caused by Open Source Data Collection and Source Selection 


This section examines potential biases that might result from relying on open source data 
collection generally and specific terrorism data sources. Like all data collection procedures, 
open source searching has strengths and weaknesses. This section explores two types of 
possible bias, (1) publicity effects, and (2) source effects. The publicity effects analysis 
focuses on whether sources are susceptible to the inclusion of outlier events, the most 
extreme, celebrated cases. The source effects analysis investigates if the events captured in 
a particular source are different from the overall universe and thus could lead to different 
conclusions. Although there is some variation by type of event, researchers have relied 
mostly on single sources to study domestic terrorism (Hewitt 2003, 2005; Smith 1994). 
This raises the question whether the choice to use one specific data source has a substantial 
effect on the type of incidents identified and the results obtained in quantitative analysis. 
This possibility is investigated by comparing several attributes of victims, suspects, and 
incidents in each specific source to all sources. 


Publicity Effects 


Relying on open sources generally, and a single source specifically, may uncover data 
susceptible to publicity effects. There is an unequal distribution of information about such 
events and thus the highest profile events are most likely to be included within a database. 
Chermak and Gruenewald’s (2006: 443) study of media coverage of domestic terrorist 
incidents concluded that only 55% of a sample of 412 incidents received any coverage in 
the New York Times, and that 15 incidents accounted for 80% of the total number of articles and 
85% of the words written about all incidents. Such high profile incidents are valuable to 
researchers because they produce multiple data points of interest and competing sources to 
better determine the veracity of information about the incident. For example, the number of 
source documents, including media reports, government documents, and books about the 
Oklahoma City bombing provides assurance that the researcher has multiple perspectives to 
consider when determining facts. One problem that relates to the concerns of this article is 
how such outlier cases might impact general and scholarly understanding about a particular 
problem. We investigate this issue with a couple of simple analyses. 

There were ten incidents that were included in at least eight of the sources. Most of 
these cases have dominated discussions about the nature of the far-right threat in the United 
States over the last 20 years. These cases include the Oklahoma City bombing that resulted 
in the deaths of 168 victims. Several of the other cases also included multiple victims, 
including two bombings orchestrated by Eric Rudolph, the shooting sprees of Ben Smith 
and Buford Furrow, and the Gunn and Kopp abortion doctor killings. In addition, the 
murder of a U.S. Marshal at Ruby Ridge and two homicides of gay men were included in 
most of the sources examined here. 

The potential bias produced by these high profile events is problematic for sources that 
include only a few incidents. These 10 incidents represent only 6% of the cases identified in 
the SPLC data and 5% in the ADL data. They represent 12% of the SLATT cases, 20% of 
the media, 18% of Rick Ross, 26% of Hewitt’s, 50% of the GTD and 73% of the FBI data. 
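
The share calculation reported above can be sketched as follows. This is an illustration under stated assumptions: the function name and the toy events are hypothetical, and the threshold is lowered from eight sources to three only so that the example stays small.

```python
def high_profile_share(event_sources, threshold=8):
    """Percent of each source's events that are 'high profile'.

    An event counts as high profile when at least `threshold` sources list it.
    event_sources: dict mapping event id -> set of source names listing it
    """
    high_profile = {eid for eid, srcs in event_sources.items()
                    if len(srcs) >= threshold}
    totals, hits = {}, {}
    for eid, srcs in event_sources.items():
        for src in srcs:
            totals[src] = totals.get(src, 0) + 1
            if eid in high_profile:
                hits[src] = hits.get(src, 0) + 1
    return {src: 100.0 * hits.get(src, 0) / totals[src] for src in totals}


# Hypothetical data, for illustration only.
event_sources = {
    "bombing": {"FBI", "GTD", "ADL"},
    "shooting": {"ADL"},
    "assault": {"ADL", "GTD"},
    "robbery": {"ADL"},
}
shares = high_profile_share(event_sources, threshold=3)
print(shares)
```

The same few celebrated events make up a much larger fraction of a small source than of a large one, which is the pattern the percentages above describe.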

We also examined whether high profile cases might increase the general interest in 
far-right criminal activity. Research has described this phenomenon as an “echo effect”: a 
single high profile case causes increased awareness about an issue and organizational 
responses that include processing similarly situated cases differently because of the 
heightened publicity (Surette 1999; see also Damphousse and Shields 2007). We explore 
the potential impact of a high profile case by examining how the number of sources might 
change. Using the Oklahoma City bombing (which occurred on April 19, 1995) time frame 
as the focus, we grouped the data into roughly equal periods, and then examined the results 
comparing 1990-1994, 1995-1999, 2000-2004, and 2005-2008. Previous research indi- 
cates that public and media attention to domestic extremism grew dramatically following 
the bombing and interest continued to the end of the decade before it waned (Chermak 
2002). 

The results indicate that interest in domestic extremism following the Oklahoma City 
bombing may have impacted source coverage of far-right violence incidents. On average, 
there were 1.9 sources for homicides that occurred between 1990 and 1994, 1.94 sources 
for homicides that occurred between 2000 and 2004, and 1.7 sources for homicides that 
occurred between 2005 and 2008. The number of sources for homicides that occurred 
during the 1995-1999 time period increased to 2.78 on average. We did the same calcu- 
lations for only the ideological homicides and the results are similar. The number of 
sources covering far-right homicides increased following the Oklahoma City bombing. 
Specifically, 2.3 sources in 1990-1994, 2.56 sources in 2000-2004, and 2.28 sources in 
2005-2008 covered ideological homicides. During the 1995-1999 time period however, 
3.54 sources covered ideological homicides. 
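
The period comparison can be sketched in a few lines. The sketch below is illustrative only: the function name and the (year, source count) pairs are hypothetical, while the four periods are the ones used in the text.

```python
def mean_sources_by_period(events, periods):
    """Average number of sources per homicide within each period.

    events:  list of (year, number_of_sources) pairs
    periods: list of (first_year, last_year) tuples, inclusive
    """
    means = {}
    for first, last in periods:
        counts = [n for year, n in events if first <= year <= last]
        means[(first, last)] = sum(counts) / len(counts) if counts else None
    return means


periods = [(1990, 1994), (1995, 1999), (2000, 2004), (2005, 2008)]
# Hypothetical (year, source count) pairs, for illustration only.
events = [(1992, 1), (1993, 3), (1996, 2), (1998, 4), (2002, 2), (2007, 1)]
means = mean_sources_by_period(events, periods)
print(means)
```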


Source Effects 


It is clear from the foregoing discussion that some open source terrorism data collection 
efforts include more right-wing homicide events than others, even after adjustments are 
made for different inclusion criteria. Importantly though, this does not necessarily mean 
that the analyses of data from these different sources would lead to different conclusions. 
The GTD may include fewer cases of right wing violence than SPLC, but the smaller group 
of cases in GTD may be just as representative of the population of right-wing homicides as 
the SPLC universe. In this case, analyzing the data from either of the two sources may 
produce similar results supporting similar conclusions. 

To test whether different data sources capture different types of right-wing homicide 
events, we examined the distributions of case characteristics after combining the most 
commonly used data sources into general categories: databases (GTD/Hewitt), internet 
(media/Rick Ross), law enforcement (SLATT), and watch-group (SPLC/ADL). If the 
different data sources are capturing the same types of events, then the univariate 
distributions of case attributes should not differ across sources. This analysis refines our 
understanding of how the inclusion or exclusion of cases impacts selectivity. Distinctions 
between classes of data sources should be tied to differences in the social organization of 
data collection. A watch-group, for example, should probably list more cases where the 
victim has the ascriptive characteristics of the groups for which the watchdogs are advo- 
cating and it should be more assiduous in recording information on the presence or absence 
of that ascriptive characteristic. For example, law enforcement sources may be less con- 
cerned about religious affiliation of the victim compared to watch-group sources. Law 
enforcement sources, however, may have more events with guns because whether someone 
is armed is important to them. 

Importantly, several of the data sources do not have values for these variables. Any 
researcher wanting to examine these variables would have to (1) use these data sets as a 
start to generate a listing of cases in which they are interested; (2) once this list is generated 
the scholar would have to do additional work to uncover values for the missing variables of 
interest. One way to do this would be to systematically search each of these events through 
all available open sources. The ECDB search protocol is in fact just a refined, compre- 
hensive search protocol. Most of the sources examined here only had the chance to have 
the case, and did not include much information about characteristics of the victim, 
defendant, or incident. Since we are using only one source of data to determine the value of 
a characteristic of an event, differences in recording the characteristics of an event will 
not contribute to differences in the characteristics across data sources. Any differences in 
characteristics of events observed across data sources will be due to selectivity in the 
inclusion of events. Characteristics of terrorist events will differ across data sources 
because one source includes a different set of events than the other. We would likely find 
more differences across sources if we allowed recording of the value of the characteristics 
to vary as well. Here we are only testing the effects of selectivity and not differences in 
recording. The conservatism of this test in this regard combined with the small number of 
cases used for this analysis (N = 59) would argue for using an alpha level of .1 in the test for 
selectivity. 

We compared three characteristics of victims (gender, age, and whether the victim was African 
American), two characteristics of suspects (age and whether they had a prior criminal record 
before committing a homicide),¹⁵ and two characteristics of the incident (number of 
suspects and whether a gun was used in the incident) across each pairing of the combined 
source categories (GTD/Hewitt v. Media/Rick Ross; GTD/Hewitt v. SLATT; GTD/Hewitt v. 
Watch-group Sources; Media/Rick Ross v. SLATT, etc.). Table 6 presents the variables that were 
statistically different. 
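
A pairwise comparison of a dichotomous attribute (for example, whether a gun was used) at the .1 alpha level can be sketched as follows. The text does not name the test statistic that was used, so the choice of a two-sided Fisher exact test here is an assumption made for illustration, as are the function name and the counts. The sketch also treats the two source categories as independent samples, which is a simplification because the same event can appear in more than one source.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x 'yes' cases in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))


ALPHA = 0.10  # the alpha level argued for in the text

# Hypothetical counts: events with / without a gun in two source categories.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
no_difference = fisher_exact_two_sided(5, 5, 5, 5)
print(round(p_value, 4), p_value < ALPHA, round(no_difference, 4))
```

An exact test is a natural fit for cell counts as small as those produced by 59 events, although any test appropriate for small samples could be substituted.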

There are several results worth noting. First, there were only a handful of statistically 
different results. Indeed, there were no differences comparing sources for the number of 
suspects, victim race, victim age, and victim gender variables. Second, most of the differences 
uncovered were for the sources with the fewest cases. Seven of the nine 
statistically different results occurred when comparing the databases to the other sources. 
Third, the majority of differences were for just two variables. Suspect priors and whether a 
gun was used were the variables that were most likely to be different when comparing 
sources. 

15 All of the suspects for these incidents were male and white. 


Discussion 


Scholars have become increasingly interested in the study of terrorism. This interest makes 
sense considering the social and political impact of the September 11th attacks. Scholars 
have begun to conduct research that is designed to produce findings that could be useful to 
policy makers and practitioners engaged in decision-making processes. One obstacle to a 
rigorous study of terrorism has been the absence of readily available data for analysis. 
Despite definitional, conceptual, and access issues, the scholarly community has responded 
by building on existing databases or creating new ones, providing the opportunity for 
increasingly sophisticated statistical analysis. These databases and other sources of ter- 
rorism or extremism data rely on open source materials. Scholars have been successful 
analyzing such sources and publishing peer-reviewed materials. Other stakeholders, such as 
law enforcement intelligence analysts, rely on open source materials as well. One possible 
limitation of using such sources, however, is a lack of understanding of their strengths and 
weaknesses. This paper attempted to take a step back to assess selectivity issues related to 
how well 10 open sources do in capturing homicide events committed by far-right 
extremists in the U.S. Several of these sources are the leading databases on terrorism and 
others like the SPLC and ADL are often used by researchers. These latter sources are 
frequently cited in news stories and are used to inform policy decisions (Chermak 2002). 

We identified significant variation in the percentage of homicides that were included in 
the different sources. We found that in the time period following the most celebrated and 
deadly far-right homicides that occurred between 1990 and 2008, the number of sources 
covering such homicides increased and a greater percentage of open sources included other 
far-right incidents compared to other time periods. The variation in the number of incidents 
in each source is partially a function of the differences in their inclusion criteria. Sources 
differ as to whether they include far-right homicides that are non-ideological, bias-motivated, 
committed by lone wolves, or prosecuted at the state level, and in the time frame 
on which they focus. It is thus crucial that scholars who use a specific database 
understand its inclusion criteria and appreciate what incident types are included or 
excluded and why. This will ensure that researchers examine a universe of events that 
accurately represents the research questions they are investigating. 

Interestingly, we discovered that sources were sometimes inconsistent with their own 
inclusion criteria. These inconsistencies operated in two ways: (1) sources included 
incidents that did not meet their criteria and therefore should have been excluded, and (2) 
sources excluded incidents that met their criteria and thus should have been included. 

Table 6 Differences in attributes using combined sources (p-value) 

GTD/Hewitt Media/Rick Ross SLATT SPLC/ADL 
GTD/Hewitt Gun (.002); Gun (.039); Priors (.022); Gun (.002); 
Priors (.078) Suspect Age (.056) Priors (.006) 
Media/Rick Ross Gun (.091); Suspect Age (.029) 
SLATT 
SPLC/ADL 

In terms of sources including incidents they should not have, most sources were less 
likely to include non-ideological crimes, those committed by lone wolves, or bias-motivated 
events. Although part of this variation can be attributed to the inclusion criteria of the 
sources, some sources contained homicides that were inconsistent with their own criteria. 
For example, the ATS, FBI, GTD, Hewitt, and SLATT inclusion criteria specifically 
exclude non-ideological routine homicides from their universes. While the FBI operated 
consistently with its criteria and did not report any non-ideological homicides, and the 
ATS, Hewitt and the GTD were mostly consistent (only 1, 1.6, and 2.5% of the non-ideological 
homicides, e.g. robbery and personal grievance, were in the ATS, Hewitt and the GTD, 
respectively), SLATT included 19% of all non-ideological homicides. Similarly, the FBI 
reported over 14% and Hewitt over 26% of lone wolf homicides even though their criteria 
exclude such attacks and they are only supposed to include incidents committed by groups. 
We did an analysis that excluded all cases that were either prosecuted as a hate crime or 
were motivated by race and found that several sources included such events even though 
their own criteria should have excluded them. 

Sources also excluded incidents they should have included. In general, the watch-group 
sources included most of the homicides, the law enforcement source included a majority of 
the ideological homicides, but media sources and the research databases included a smaller 
percentage of these homicides. Our results indicate that every source “missed” incidents 
that according to its own criteria should have been included. 

This provides evidence that by combining various sources one gets closer to including 
most of the universe of available far-right homicide cases. Indeed, there appears to be 
substantial value in constructing a database by combining the incidents from multiple 
sources. Nearly half of the homicide incidents appeared in only one source. Non-ideo- 
logical and lone wolf homicides were even more likely to be included in only one source. 
Our catchment-re-catchment analysis confirmed this: the addition of each new source 
increased the number of events captured, but the results converged, with fewer new events 
found as each further source was added. Out of the cases used in the 
previous analysis (N = 59), our coders uncovered only two additional incidents (less than 
3% of all events) that did not appear in any other source. This suggests that these data 
sources are close to representing the true universe of known far-right homicide events. 
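
The logic of the catchment-re-catchment analysis can be sketched as follows. This is a minimal illustration, not the procedure used to build the ECDB: the function name, the source names, and the event identifiers are all hypothetical. For each source, taken in a fixed order, it counts the events not seen in any earlier source and the events already captured by at least one earlier source.

```python
def catchment_recatchment(sources):
    """For each source, in order, count events not seen in earlier sources
    and events already captured by at least one earlier source."""
    seen = set()
    rows = []
    for name, events in sources:
        events = set(events)
        new = events - seen          # first captured by this source
        recaught = events & seen     # already captured by an earlier source
        seen |= new
        rows.append((name, len(new), len(recaught), len(seen)))
    return rows


# Hypothetical source names and event IDs, invented for illustration only.
example_sources = [
    ("source_a", {1, 2, 3, 4, 5, 6}),
    ("source_b", {4, 5, 6, 7, 8}),
    ("source_c", {1, 2, 7, 8, 9}),
    ("source_d", {2, 3, 5, 8, 9}),
]
table = catchment_recatchment(example_sources)
```

Each row holds the source name, the number of new events, the number of re-caught events, and the cumulative total. A falling count of new events alongside a rising count of re-caught events is the convergence pattern described above. The counts depend on the order in which sources are added, so the order should be reported with the results.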

Despite the variations in coverage of incidents and an over exclusion of celebrated 
cases, the basic attributes of victims, suspects, and incidents tested here were generally 
presented similarly. But it is important to note that the sources with the fewest cases 
produced the most differences, and that differences occur for some types of variables 
(guns, suspect priors). This is an area of research that needs more development because we 
only examined a handful of characteristics, and we were not able to examine whether 
differences in recording of the characteristics of a specific attribute varied by type of 
source. 

In sum, this study provides the first preliminary empirical support for the notion that 
scholars using open-source data are using data that is representative of the larger universe 
they are interested in. In fact, it appears that selectivity bias may be less of a problem than 
initially feared. But there is still much to learn about the strengths and weaknesses of open 
sources. It is important that researchers understand possible biases of their data because it 
is then possible to make necessary adjustments. It is imperative with large data collection 
efforts, e.g. the ECDB, ATS, or GTD, that the principal investigators understand, discuss, 
and attempt to correct systematic biases in their data and subsequently provide this 
information to others. Future research could also extend this study by examining additional 
crime types and terrorism generally, as well as looking at additional ideologies (e.g., eco- 
terrorism and Jihadi events) at locations outside the United States. 

One way to improve data transparency for future users is to create an error profile. An 
error profile identifies possible sources within the data collection methodology that may 
bias the results through non-sampling errors. Such errors can negatively affect the data 
even if a census or random sample is used. This becomes increasingly important as datasets 
are made public to researchers and policymakers who were not involved in the initial 
research design and collection of data. Even though in some research articles the authors 
include a small section that discusses the limitations of the data and their findings, it is 
important that researchers analyzing secondary data understand and have access to detailed 
methodological information which principal investigators and research assistants accu- 
mulate during the data collection and analysis. 

In the late 1970s, a commission on non-sampling errors created an error profile to 
identify and correct possible problems associated with the survey data used by the Census 
Bureau. The commission recommended that future government data collection efforts do 
the same. Although most large data collection efforts in the field of terrorism are not based 
on survey data, an error profile could still be created by documenting possible 
non-sampling errors, and that documentation could be used to address the bias such errors 
create. It can also be used by researchers analyzing secondary data to inform 
discussions of the limits of their study and possible policy implications. 

As the purpose of the profile is to make researchers aware of the known errors in the 
data set so they might change their analysis or their inferences from the analysis accord- 
ingly, the value of an error profile increases for dynamic data collection efforts such as 
open source terrorism databases, which are continually updated and refined. Few data 
collections in social science have been the subject of sufficient methodological research as 
to have an error profile, so researchers generally ignore potential errors or speculate about 
them without empirical evidence. To be sure, it is easier to identify and empirically 
investigate sources of error in static data sets, but few single use data sets attract enough 
attention to build much of an error profile. By contrast, data collection efforts that are not a 
single cross-section undergo changes in definitions and procedures that affect the accuracy and 
comparability of the resulting data (e.g., the UCR and the NCVS). Even when the procedures 
and definitions of the collection stay the same, society can change in ways that introduce 
error. If we find that data sources, search engines, or search terms affect the quality of the 
data, then changes in these aspects of the data collection should be documented. Then, if a 
database compiled with open source data adds a different data source or uses a different 
search engine or different search terms, these changes should be noted in the documentation, 
and both database and documentation should be dated. Making researchers aware of 
error and identifying procedural differences in open source search that can affect the 
accuracy of the data is a significant step forward. 
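
One way such dated documentation might be kept is as a structured change log that secondary users can query for the period their analysis covers. The sketch below is only an assumption about what this could look like: the `ProcedureChange` record, the `changes_affecting` helper, and every entry in the log are hypothetical and do not describe any existing database.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ProcedureChange:
    """One dated entry in a hypothetical error-profile change log."""
    effective: date
    component: str      # "data source", "search engine", or "search terms"
    description: str


def changes_affecting(log, start, end):
    """Return changes that took effect within [start, end], oldest first."""
    hits = [c for c in log if start <= c.effective <= end]
    return sorted(hits, key=lambda c: c.effective)


# Illustrative entries only; dates and descriptions are invented.
log = [
    ProcedureChange(date(2009, 3, 1), "search terms", "added a new keyword pair"),
    ProcedureChange(date(2008, 6, 15), "data source", "added a watch-group report"),
    ProcedureChange(date(2010, 1, 10), "search engine", "switched news archive"),
]
recent = changes_affecting(log, date(2008, 1, 1), date(2009, 12, 31))
```

A researcher comparing two time periods could use such a query to check whether an apparent change in the number of incidents coincides with a change in collection procedure.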

For the ECDB specifically, an error profile might address the following two situations 
involving possible non-sampling errors: identification of incidents and coder decisions. As 
one of the ECDB’s objectives is to conduct a census of all homicides committed by 
individuals who subscribe to a far-right ideology, any missing homicides introduce error 
into the data and the subsequent analyses. True far-right homicide incidents may be missed 
for many different reasons, including the following: a murder was miscategorized by police, 
prosecutors, or medical examiners as a suicide or natural death; the far-right suspects were 
never identified as suspects in a homicide; suspects were identified but were not tied to 
the far-right; information that did tie the suspect to the far-right was never reported to the 
media; or the media never incorporated an individual’s far-right ideology into their articles, 
especially when the homicide was not ideologically motivated. 

In addition, when searching open-sources for cases, other forms of non-sampling error 
may occur. For example, SPLC or ADL publications may link a suspect to the far-right 
based on unsubstantiated data that, because of its lack of transparency, might cause over 
coverage of the population of interest, extending the sample outside of its intended frame. 
Ideally, this type of error would be corrected in the coding process when no additional 
information relating a suspect to the far-right can be found in follow up open source 
searching. In relation to media searches that are used to identify cases, the key terms and 
sources used can also introduce non-sampling error. The ECDB used Lexis-Nexis, which, 
although it is an extensive database of media accounts, does not represent all media outlets 
in the United States. In addition, the key terms used to identify homicide incidents may not 
identify all of those connected to the far-right. Searching for “skinhead and homicide” or 
“skinhead and murder” will not identify incidents that may have been committed by a 
constitutionalist or an anti-government survivalist. If the missing cases are random, that is 
to say if there are no systematic reasons why one type of incident may be included while 
another is excluded, then the effect on the data and on the analyses using this data 
would be minimal. However, since the non-sampling errors make it impossible to compare 
the true universe of homicide incidents to the actual incidents collected, it is also 
impossible to estimate the effects that non-sampling errors have on the data collection 
process, and that, in turn, prevents a researcher from using statistical techniques to adjust 
for the bias. 
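
The search-term problem described above can be made concrete with a small sketch. It uses naive substring matching rather than the query syntax of Lexis-Nexis or any other archive, and the article snippets, identifiers, and function names are invented for illustration.

```python
def matches(query_terms, text):
    """True if every term in the query appears in the text (case-insensitive)."""
    lowered = text.lower()
    return all(term in lowered for term in query_terms)


def coverage(queries, articles):
    """Split article IDs into those found by any query and those missed."""
    found = {aid for aid, text in articles.items()
             if any(matches(q, text) for q in queries)}
    return found, set(articles) - found


# Invented article snippets for illustration; not real news accounts.
articles = {
    "a1": "Skinhead charged with murder after street attack",
    "a2": "Skinhead convicted in homicide case",
    "a3": "Anti-government survivalist charged with murder of deputy",
    "a4": "Self-described constitutionalist held in homicide investigation",
}
queries = [("skinhead", "homicide"), ("skinhead", "murder")]
found, missed = coverage(queries, articles)
```

Here the two skinhead queries find `a1` and `a2` but miss `a3` and `a4`. Because the missed articles all involve other far-right movements, the omission is systematic rather than random, which is the situation in which the resulting bias cannot be assumed to be minimal.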

All of the aforementioned non-sampling errors can cause either an overcount or an 
undercount of far-right homicide incidents, but decisions made by coders can also alter the 
true universe of interest. Once incidents are identified, trained coders examine open-source 
materials that are used to identify whether a suspect is a far-rightist or not. When weighing 
the open-source evidence and attempting to decide whether a homicide incident was 
committed by a far-rightist, the decision one coder makes may not be the same as another 
coder who is weighing the exact same evidence. This discrepancy may cause some coders 
to include incidents that others would exclude and vice versa. Coders can also cause non- 
sampling errors, not just in the identification of an incident, but in the collection and coding 
of data related to each incident (for a discussion of intercoder reliability see Gruenewald 
and Pridemore, this issue). It is the purpose of an error profile to identify all of the possible 
sources of non-sampling error and to explain them in a way so that future researchers will 
have the ability to understand the strengths and weaknesses of the data with which they are 
working. 
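
Disagreement between coders of the kind described above can be quantified and reported in an error profile. The sketch below computes Cohen's kappa, one common chance-corrected agreement statistic, for two coders' include/exclude decisions. The choice of kappa is an assumption made here for illustration; it is not stated to be the measure used for the ECDB, and the decisions shown are invented.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' labels on the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed proportion of items on which the coders agree.
    observed = sum(x == y for x, y in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical include/exclude decisions on ten incidents (invented data).
coder_a = ["in", "in", "in", "in", "in", "in", "out", "out", "out", "out"]
coder_b = ["in", "in", "in", "in", "in", "out", "in", "out", "out", "out"]
kappa = cohens_kappa(coder_a, coder_b)
```

In this example the coders agree on eight of ten incidents, but because some agreement is expected by chance, kappa is lower than the raw 80% agreement rate.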

The creation of such error profiles is a positive step in improving the quality of data 
collected to study terrorism. Such profiles should be informed by a growing body of 
research from multiple sources related to the identification, collection, and processing of 
such data (see Chen et al. 2008). Finally, as the empirical and theoretical contributions to 
the understanding of terrorism continue to grow, there appears to be an equally important 
need for scholars to openly engage in dialogue about the methods and data used for the 
study of terrorism. 